
RC docs-sync: daily Claude-authored doc PRs (experimental rollout, wordpress-seo)#388

Merged
enricobattocchi merged 5 commits into main from rc-docs-sync-proposal on Apr 28, 2026

@enricobattocchi (Member) commented Apr 24, 2026

Summary

Ships an experimental daily workflow that polls Yoast/wordpress-seo for new RC tags, runs a Claude agent against the developer-portal docs, and opens draft PRs when docs need updating. Going live for Yoast SEO (free) first — once real RCs have run through it and the output looks good, we expand to the other documented products.

What this PR adds

  • AGENT_MAP.md (repo root) — source of truth for the feature-area taxonomy: docs paths per area, per-product source-path globs, symbol namespaces, and the Product → source repo table (with human-readable display names). The agent loads this at runtime to triage an RC diff and route proposed changes into PRs.
  • .github/claude-agent/run.md — the agent orchestration prompt. Triage + authoring + PR-creation flow, style rules, authoring discipline, PR title format, and the machine-readable marker for the run-summary comment (which doubles as state for the next run). Includes a coverage-gap self-reporting step so every RC run surfaces public surface observed in the diff but not yet covered by AGENT_MAP.md.
  • .github/workflows/rc-docs-sync.yml — the daily workflow. Two jobs: a resolve job that builds the queue of (product, rc_tag) pairs to process, and a process job that fans out as a matrix and invokes the Claude agent per item. Triggers: schedule with cron 0 6 * * * (daily at 06:00 UTC), plus workflow_dispatch for manual backfill.
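A minimal Python sketch of the hand-off between the two jobs, assuming the resolve job writes its queue as a JSON step output (through the file named by $GITHUB_OUTPUT) that the process job then reads into its matrix. build_matrix and write_output are hypothetical helper names, not the workflow's actual code.

```python
import json
import os


def build_matrix(queue: list[tuple[str, str]]) -> str:
    """Serialize (product, rc_tag) pairs as a matrix object with an `include` list."""
    include = [{"product": product, "rc_tag": rc_tag} for product, rc_tag in queue]
    return json.dumps({"include": include}, separators=(",", ":"))


def write_output(name: str, value: str) -> None:
    """Append `name=value` to the file named by $GITHUB_OUTPUT.

    Outside GitHub Actions the variable is unset, so print the line instead.
    """
    path = os.environ.get("GITHUB_OUTPUT")
    line = f"{name}={value}"
    if path is None:
        print(line)
        return
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(line + "\n")


if __name__ == "__main__":
    # Hypothetical queue; the real resolve job derives it from tags and markers.
    print(build_matrix([("wordpress-seo", "27.6-RC1")]))
```

On the consuming side, something like `matrix: ${{ fromJSON(needs.resolve.outputs.queue) }}` (with the step output re-exported under the resolve job's `outputs:`) would expand each pair into its own matrix job.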

How it works (per daily run)

  1. For each opted-in product (just wordpress-seo right now), query the product repo's tags via the public GitHub API — anonymous, no token.
  2. Read the tracking issue's comments on this repo, looking for the most recent machine-readable marker:
    <!-- rc-docs-sync:v1 product=wordpress-seo rc_tag=<tag> -->
    
    That tag is the "last processed." If none is found, seed one for the current latest RC and process nothing historically (first-run safety).
  3. For each RC tag newer than the marker, in version order (oldest first):
    • Shallow-clone Yoast/wordpress-seo at the RC tag (anonymous).
    • Compute full + noise-filtered diffs against the previous stable release.
    • Extract the product's readme.txt changelog entry.
    • Build a symbol index from the current docs/ tree.
    • Invoke the Claude agent (anthropics/claude-code-action@v1, model claude-sonnet-4-6) with .github/claude-agent/run.md. The agent triages, opens one PR per affected feature area (branch rc-sync/<product>/<rc_tag>/<area>, title Yoast SEO <base-version> — docs(<area>): <title>), surfaces any AGENT_MAP.md coverage gaps it observed, and posts a summary comment back to the tracking issue. The comment's marker is the state for tomorrow's run.
  4. If the filtered diff is empty (only tests/translations/lockfiles changed), post a one-liner no-op comment and move on — no PR, no agent invocation.
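The queue-resolution part of steps 1 to 3 can be sketched in Python as follows. This is a sketch under stated assumptions: RC tags are assumed to look like 26.3-RC1 or 26.1.1-RC1, the tag names are passed in (in the workflow they come from the public, paginated GET /repos/Yoast/wordpress-seo/tags endpoint), and rc_key / resolve_queue are hypothetical names.

```python
import re
from typing import Optional

# Assumed RC tag shape, e.g. "26.3-RC1" or "26.1.1-RC1".
RC_TAG = re.compile(r"^(\d+(?:\.\d+)*)-RC(\d+)$")

RcKey = tuple[tuple[int, ...], int]


def rc_key(tag: str) -> Optional[RcKey]:
    """Sortable key for an RC tag, or None for stable/other tags."""
    m = RC_TAG.match(tag)
    if m is None:
        return None
    base = tuple(int(part) for part in m.group(1).split("."))
    return base, int(m.group(2))


def resolve_queue(tags: list[str], last_processed: Optional[str]) -> tuple[list[str], Optional[str]]:
    """Return (queue, seed).

    queue: RC tags newer than the last-processed marker, oldest first.
    seed:  on a first run (no marker), the newest RC to record as the
           starting point; nothing is queued in that case.
    """
    rcs = sorted((t for t in tags if rc_key(t) is not None), key=rc_key)
    if last_processed is None:
        return [], (rcs[-1] if rcs else None)
    cutoff = rc_key(last_processed)
    if cutoff is None:
        raise ValueError(f"marker does not hold an RC tag: {last_processed}")
    return [t for t in rcs if rc_key(t) > cutoff], None
```

Comparing integer tuples rather than strings matters here: as plain strings, "26.10-RC1" would sort before "26.9-RC1".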

Why this architecture

  • No GitHub App, no PAT, no cross-repo secrets. Product repos are public → anonymous cloning; writes to this repo use the built-in GITHUB_TOKEN. The only external secret is ANTHROPIC_API_KEY.
  • Never writes to main. State lives in tracking-issue comments (not a committed state file), so the workflow is indifferent to main being protected.
  • Cloudflare Pages preview is the PR check. The agent doesn't re-run yarn build locally — broken Docusaurus builds fail the CF Pages deploy and surface on the PR.
  • No auto-merge. GITHUB_TOKEN can't approve PRs, and the workflow never calls gh pr merge. Human reviewer is the sole merge gate.

Validation

Three manual spikes were run against past RCs of Yoast/wordpress-seo:

| Spike | Case | Result |
|---|---|---|
| A | Narrow positive: wpseo_llmstxt_link_description filter in 26.3-RC1 | 1 PR plan, area llms-txt; the authored docs semantically matched the ground-truth doc PR. |
| N | Negative: 26.1.1-RC1 bugfix hotfix | Correctly returned 0 PR plans with a clean "no doc changes needed" rationale. |
| C | Hard positive: Schema Aggregator in 27.1-RC1 (216 new source files, 11+ new filters, new REST + CLI surfaces) | 2 scoped PR plans: schema-aggregator (3 new docs + sidebars.js) and a bonus catch for robots-txt (Schemamap directive + wpseo_disable_robots_schemamap filter). The team had to address the robots-txt update in follow-up doc commits after the original schema-aggregator PR; the agent would have caught it at RC time. |

End-to-end dry-runs of the orchestration prompt on Spikes A and C produced the expected proposed-docs.patch, proposed-sidebars.patch, and run-summary.md artifacts (with correct marker, correct PR-title format, correct area placement).
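The branch and title conventions from step 3 are simple enough to check mechanically in a dry-run. A minimal sketch, assuming <base-version> is the RC tag with its -RCn suffix stripped (26.3-RC1 gives 26.3); base_version, branch_name and pr_title are hypothetical helper names. The title separator is written as the \u2014 escape so the string matches the documented format exactly.

```python
import re

# Assumed RC tag shape; group 1 is the base version.
RC_TAG = re.compile(r"^(\d+(?:\.\d+)*)-RC\d+$")
TITLE_SEPARATOR = "\u2014"  # the dash used in the documented PR title format


def base_version(rc_tag: str) -> str:
    """Strip the -RCn suffix: '26.3-RC1' -> '26.3'."""
    m = RC_TAG.match(rc_tag)
    if m is None:
        raise ValueError(f"not an RC tag: {rc_tag}")
    return m.group(1)


def branch_name(product: str, rc_tag: str, area: str) -> str:
    """rc-sync/<product>/<rc_tag>/<area>"""
    return f"rc-sync/{product}/{rc_tag}/{area}"


def pr_title(display_name: str, rc_tag: str, area: str, title: str) -> str:
    """<display name> <base-version> <separator> docs(<area>): <title>"""
    return f"{display_name} {base_version(rc_tag)} {TITLE_SEPARATOR} docs({area}): {title}"
```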

Activation status

| Step | Status |
|---|---|
| Tracking issue created and pinned #390 | |
| Repo variable TRACKING_ISSUE_WORDPRESS_SEO = 390 | |
| Repo secret ANTHROPIC_API_KEY | |
| Real Claude agent invocation wired (anthropics/claude-code-action@v1, claude-sonnet-4-6, narrow allowlist) | |
| Manual workflow_dispatch validation against a past RC | ⏳ pending: to be run against 27.6-RC1 once that tag lands |

Known follow-ups (not blocking)

  • Diff base optimization: today the workflow always diffs an RC against the latest stable release, so iterative RCs (RC2, RC3 of the same base version) re-process the same content. Sketched refinement: diff against the latest already-processed RC of the same base version when one exists. Halves cost on iterative RCs and prevents duplicate-PR noise.
  • PRODUCTS dict in the workflow currently lists only wordpress-seo. Adding a product = add slug + source repos + display name + tracking-issue var name there, plus create its tracking issue + repo variable.
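A sketch of the diff-base refinement described in the first follow-up (not implemented in this PR). It assumes the list of already-processed RCs is available, for example collected from earlier run-summary markers, and that RC tags look like 27.1-RC2; choose_diff_base is a hypothetical name.

```python
import re

# Assumed RC tag shape: base version in group 1, RC number in group 2.
RC_TAG = re.compile(r"^(\d+(?:\.\d+)*)-RC(\d+)$")


def choose_diff_base(rc_tag: str, processed_rcs: list[str], latest_stable: str) -> str:
    """Pick the tag to diff `rc_tag` against.

    Prefer the newest already-processed RC of the same base version with a
    lower RC number; otherwise fall back to the latest stable release
    (today's behaviour).
    """
    m = RC_TAG.match(rc_tag)
    if m is None:
        raise ValueError(f"not an RC tag: {rc_tag}")
    base, number = m.group(1), int(m.group(2))
    earlier = []
    for tag in processed_rcs:
        pm = RC_TAG.match(tag)
        if pm and pm.group(1) == base and int(pm.group(2)) < number:
            earlier.append((int(pm.group(2)), tag))
    # max() on (rc_number, tag) tuples picks the highest earlier RC number.
    return max(earlier)[1] if earlier else latest_stable
```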

Rollout plan

  • V1 (this PR): wordpress-seo only.
  • V2: once 2–3 RCs look good, add wordpress-seo-premium and wordpress-seo-local.
  • V3+: the remaining documented products — wpseo-news, wpseo-video, wpseo-woocommerce, shopify-seo, duplicate-post. Each is a pure config addition.

🤖 Planning and artifact drafting assisted by Claude.

Adds the plumbing for a daily GitHub Action that polls Yoast product repos
for new RC tags, runs a Claude agent against the developer-portal docs, and
opens draft PRs where doc updates are warranted.

Scope for this first phase: Yoast SEO (free) only. More products added
iteratively by extending AGENT_MAP.md and the PRODUCTS dict in the workflow.

Architecture:
- No GitHub App or PAT required; product repos are public so anonymous
  cloning works and all writes to this repo use GITHUB_TOKEN.
- Never writes to main; state lives in tracking-issue comments, identified
  by a machine-readable marker embedded in every run-summary comment.
- Cloudflare Pages preview deploy on PR is the per-PR validation.
- PRs are never auto-merged; branch protection's PR-review rule is the gate.
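A minimal sketch of reading that state back, assuming the comment bodies are fetched oldest first (the REST API lists issue comments in ascending ID order) and that the marker has exactly the form <!-- rc-docs-sync:v1 product=<product> rc_tag=<tag> -->. The function names are hypothetical.

```python
import re
from typing import Optional

MARKER = re.compile(
    r"<!--\s*rc-docs-sync:v1\s+product=(?P<product>\S+)\s+rc_tag=(?P<rc_tag>\S+)\s*-->"
)


def make_marker(product: str, rc_tag: str) -> str:
    """Render the marker embedded in every run-summary comment."""
    return f"<!-- rc-docs-sync:v1 product={product} rc_tag={rc_tag} -->"


def last_processed(comment_bodies: list[str], product: str) -> Optional[str]:
    """Return the rc_tag of the most recent marker for `product`, or None.

    Bodies are assumed to be oldest first, so later markers overwrite earlier ones.
    """
    latest = None
    for body in comment_bodies:
        for m in MARKER.finditer(body):
            if m.group("product") == product:
                latest = m.group("rc_tag")
    return latest
```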

Validated through three manual spikes (narrow positive, negative hotfix,
multi-file new feature) plus end-to-end dry-runs of the orchestration prompt.

Activation requirements (handled post-merge, see PR body): create a tracking
issue, set TRACKING_ISSUE_WORDPRESS_SEO repo variable, set ANTHROPIC_API_KEY
secret (coordinated with devops).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cloudflare-workers-and-pages Bot commented Apr 24, 2026

Deploying yoast-developer with Cloudflare Pages

Latest commit: 075164f
Status: ✅ Deploy successful!
Preview URL: https://1a0ce4d1.yoast-developer.pages.dev
Branch Preview URL: https://rc-docs-sync-proposal.yoast-developer.pages.dev


enricobattocchi and others added 3 commits April 24, 2026 17:18
Two refinements in response to review:

- Agent now detects and reports "AGENT_MAP.md coverage gaps" during every RC
  run. A coverage gap is a hunk that looks like public surface (new
  apply_filters / do_action, new REST route, new top-level src/<subsystem>/
  file with public classes) whose path or symbol isn't covered by any area's
  source_paths or symbol_namespaces. Listed in the run-summary comment under
  a "Coverage gaps observed" section only when present. Informational; does
  not block the run. Turns every RC into a free audit of the map.

- Removed AI Brand Insights from AGENT_MAP.md's Product table and from the
  `ai` area's product list. Rationale: the developer portal currently has
  no feature-spec docs for it (only a changelog), so every docs-sync run on
  the product would reliably produce zero PRs. Keeping it in the map would
  waste compute and review attention on unambiguously no-op runs. Added a
  note in the `ai` area describing how to re-add it (product table entry
  plus source paths for ai-insights-api and ai-insights-frontend, plus a
  split-product rule) when/if feature docs land.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
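One way the coverage-gap check could be approximated outside the agent: scan added diff lines for apply_filters / do_action calls in files that no area's source_paths glob matches. This is a minimal sketch that covers only the hook case and path globs (not REST routes or symbol_namespaces); coverage_gaps and the HOOK regex are assumptions, since the actual check is performed by the agent following run.md.

```python
import fnmatch
import re

# Added diff lines that register a hook, e.g. "+ apply_filters( 'hook_name', ... )".
HOOK = re.compile(r"""^\+.*\b(?:apply_filters|do_action)\(\s*['"]([\w/-]+)['"]""")


def coverage_gaps(diff_by_path: dict[str, str], source_globs: list[str]) -> list[tuple[str, str]]:
    """Return (path, hook) pairs for new hooks in files that no area claims.

    diff_by_path maps a file path to its unified-diff text; source_globs is the
    union of every area's source_paths patterns. fnmatch's "*" also matches
    "/", so "src/llms-txt/*" covers nested files too.
    """
    gaps = []
    for path, diff in diff_by_path.items():
        if any(fnmatch.fnmatchcase(path, glob) for glob in source_globs):
            continue  # some area already covers this file
        for line in diff.splitlines():
            m = HOOK.match(line)
            if m:
                gaps.append((path, m.group(1)))
    return gaps
```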
Drop the "off-by-default" flag from the duplicate-post area. It has docs
in this repo (docs/duplicate-post/**) and an active release cadence in
Yoast/duplicate-post, so there's no reason to treat it differently from
any other documented product.

Simplify the agent's "never touch" rule to docs/development/** only —
the per-area docs_paths matching already ensures docs/duplicate-post/ is
only touched when PRODUCT=duplicate-post.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
AI Brand Insights has no feature docs in this repo and will not gain any,
so per-product notes about how to re-integrate it later are noise. Replaced
the two AI-Brand-Insights-specific notes (Product table + `ai` area) with
a generic rule stating that only products with feature docs belong in the
table.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the placeholder shell line with an anthropics/claude-code-action@v1
step. To make it possible to use a `uses:` step inside the per-RC processing
loop, the workflow is reorganized into two jobs:

  - resolve: runs the Python queue-resolution + first-run seeding logic and
    emits the queue as a JSON output for downstream matrix consumption.

  - process: matrix over the queue (max-parallel=1 to keep PR creation
    serial for now), one job per (product, rc_tag) pair. Each job clones
    the source repo at the RC and previous-release tags, builds the diff
    bundle + symbol index + changelog source, then either posts a no-op
    summary (if the filtered diff is empty) or invokes the Claude agent.
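A minimal sketch of the no-op decision, assuming the changed paths come from something like `git diff --name-only <prev-release> <rc-tag>` and that the noise patterns resemble the hypothetical NOISE_GLOBS below (the workflow's real filter may differ).

```python
import fnmatch

# Hypothetical noise patterns (tests, translations, lockfiles). fnmatch's "*"
# also matches "/", so "*/tests/*" covers nested test folders anywhere.
NOISE_GLOBS = [
    "tests/*",
    "*/tests/*",
    "languages/*",
    "*.po",
    "*.pot",
    "*.mo",
    "composer.lock",
    "package-lock.json",
    "yarn.lock",
]


def is_noise(path: str) -> bool:
    return any(fnmatch.fnmatchcase(path, glob) for glob in NOISE_GLOBS)


def filter_changed_paths(paths: list[str]) -> list[str]:
    """Keep only files whose changes could plausibly affect developer docs."""
    return [p for p in paths if not is_noise(p)]


def next_action(paths: list[str]) -> str:
    """'no-op' if only noise changed, otherwise 'invoke-agent'."""
    return "invoke-agent" if filter_changed_paths(paths) else "no-op"
```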

The agent step:
  - Reads .github/claude-agent/run.md as its instructions.
  - Has env vars PRODUCT, RC_TAG, DISPLAY_NAME, BUNDLE_DIR, TRACKING_ISSUE,
    PREV_RELEASE, WORKFLOW_RUN_URL set explicitly.
  - Has a narrow allowed-tools list: Read/Grep/Glob/Edit/Write on the
    workspace, plus Bash for git, gh, and a handful of read-only utilities.
  - Defaults to claude-sonnet-4-6, max-turns 50.

GITHUB_TOKEN is picked up automatically by the action, so the agent can
push branches, open PRs, label them, and post issue comments via gh
without further configuration.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@enricobattocchi enricobattocchi marked this pull request as ready for review April 28, 2026 14:23
@enricobattocchi enricobattocchi changed the title Proposal: RC docs-sync (daily poll + AI-drafted doc PRs) RC docs-sync: daily poll + AI-drafted doc PRs Apr 28, 2026
@enricobattocchi enricobattocchi changed the title RC docs-sync: daily poll + AI-drafted doc PRs RC docs-sync: daily Claude-authored doc PRs (experimental rollout, wordpress-seo) Apr 28, 2026
@enricobattocchi enricobattocchi merged commit 4fa8756 into main Apr 28, 2026
1 check passed
@enricobattocchi enricobattocchi deleted the rc-docs-sync-proposal branch April 28, 2026 14:27